Collaborating Authors

 Municipality of Maribor



Privacy Challenges and Solutions in Retrieval-Augmented Generation-Enhanced LLMs for Healthcare Chatbots: A Review of Applications, Risks, and Future Directions

Guan, Shaowei, Kwok, Hin Chi, Law, Ngai Fong, Stiglic, Gregor, Qin, Harry, Hui, Vivian

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) has rapidly emerged as a transformative approach for integrating large language models into clinical and biomedical workflows. However, privacy risks, such as protected health information (PHI) exposure, remain inconsistently mitigated. This review provides a thorough analysis of the current landscape of RAG applications in healthcare, including (i) sensitive data types across clinical scenarios, (ii) the associated privacy risks, (iii) current and emerging data-privacy protection mechanisms, and (iv) future directions for patient data privacy protection. We synthesize 23 articles on RAG applications in healthcare and systematically analyze privacy challenges through a pipeline-structured framework encompassing data storage, transmission, retrieval, and generation stages, delineating potential failure modes, their underlying causes in threat models and system mechanisms, and their practical implications. Building on this analysis, we critically review 17 articles on privacy-preserving strategies for RAG systems. Our evaluation reveals critical gaps, including insufficient clinical validation, absence of standardized evaluation frameworks, and lack of automated assessment tools. We propose actionable directions based on these limitations and conclude with a call to action. This review provides researchers and practitioners with a structured framework for understanding privacy vulnerabilities in healthcare RAG and offers a roadmap toward developing systems that achieve both clinical effectiveness and robust privacy preservation.



Bayesian Distributional Models of Executive Functioning

Kasumba, Robert, Lu, Zeyu, Marticorena, Dom CP, Zhong, Mingyang, Beggs, Paul, Pahor, Anja, Ramani, Geetha, Goffney, Imani, Jaeggi, Susanne M, Seitz, Aaron R, Gardner, Jacob R, Barbour, Dennis L

arXiv.org Artificial Intelligence

This study uses controlled simulations with known ground-truth parameters to evaluate how Distributional Latent Variable Models (DLVM) and Bayesian Distributional Active LEarning (DALE) perform in comparison to conventional Independent Maximum Likelihood Estimation (IMLE). DLVM integrates observations across multiple executive function tasks and individuals, allowing parameter estimation even under sparse or incomplete data conditions. DLVM consistently outperformed IMLE, especially with smaller amounts of data, and converged faster to highly accurate estimates of the true distributions. In a second set of analyses, DALE adaptively guided sampling to maximize information gain, outperforming random sampling and fixed test batteries, particularly within the first 80 trials. These findings establish the advantages of combining DLVM's cross-task inference with DALE's optimal adaptive sampling, providing a principled basis for more efficient cognitive assessments.


Enhancing Cryptocurrency Sentiment Analysis with Multimodal Features

Liu, Chenghao, Mahanti, Aniket, Naha, Ranesh, Wang, Guanghao, Sbai, Erwann

arXiv.org Artificial Intelligence

As cryptocurrencies gain popularity, the digital asset marketplace becomes increasingly significant. Understanding social media signals offers valuable insights into investor sentiment and market dynamics. Prior research has predominantly focused on text-based platforms such as Twitter. However, video content remains underexplored, despite potentially containing richer emotional and contextual sentiment that is not fully captured by text alone. In this study, we present a multimodal analysis comparing TikTok and Twitter sentiment, using large language models to extract insights from both video and text data. We investigate the dynamic dependencies and spillover effects between social media sentiment and cryptocurrency market indicators. Our results reveal that TikTok's video-based sentiment significantly influences speculative assets and short-term market trends, while Twitter's text-based sentiment aligns more closely with long-term dynamics. Notably, the integration of cross-platform sentiment signals improves forecasting accuracy by up to 20%.


Standardized Multi-Layer Tissue Maps for Enhanced Artificial Intelligence Integration and Search in Large-Scale Whole Slide Image Archives

Fiala, Gernot, Plass, Markus, Harb, Robert, Regitnig, Peter, Skok, Kristijan, Zoughbi, Wael Al, Zerner, Carmen, Torke, Paul, Kargl, Michaela, Müller, Heimo, Brazdil, Tomas, Gallo, Matej, Kubín, Jaroslav, Stoklasa, Roman, Nenutil, Rudolf, Zerbe, Norman, Holzinger, Andreas, Holub, Petr

arXiv.org Artificial Intelligence

A Whole Slide Image (WSI) is a high-resolution digital image created by scanning an entire glass slide containing a biological specimen, such as tissue sections or cell samples, at multiple magnifications. These images can be viewed, analyzed, and shared digitally, and are used today for Artificial Intelligence (AI) algorithm development. WSIs are used in a variety of fields, including pathology for diagnosing diseases and oncology for cancer research. They are also utilized in neurology, veterinary medicine, hematology, microbiology, dermatology, pharmacology, toxicology, immunology, and forensic science. When assembling cohorts for the training or validation of an AI algorithm, it is essential to know what is present on such a WSI. However, there is currently no standard for this metadata, so such selection has mainly been done through manual inspection, which is not suitable for large collections with several million objects. We propose a general framework to generate a 2D index map for WSI and a profiling mechanism for specific application domains. We demonstrate this approach in the field of clinical pathology, using common syntax and semantics to achieve interoperability between different catalogs. Our approach augments each WSI collection with a detailed tissue map that provides fine-grained information about the WSI content. The tissue map is organized into three layers: source, tissue type, and pathological alterations, with each layer assigning segments of the WSI to specific classes. We illustrate the advantages and applicability of the proposed standard through specific examples in WSI catalogs, Machine Learning (ML), and graph-based WSI representations.


Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction

Samimi, Reza, Bhattacharya, Aditya, Gosak, Lucija, Stiglic, Gregor, Verbert, Katrien

arXiv.org Artificial Intelligence

Healthcare professionals need effective ways to use, understand, and validate AI-driven clinical decision support systems. Existing systems face two key limitations: complex visualizations and a lack of grounding in scientific evidence. We present an integrated decision support system that combines interactive visualizations with a conversational agent to explain diabetes risk assessments. We propose a hybrid prompt handling approach combining fine-tuned language models for analytical queries with general Large Language Models (LLMs) for broader medical questions, a methodology for grounding AI explanations in scientific evidence, and a feature range analysis technique to support deeper understanding of feature contributions. We conducted a mixed-methods study with 30 healthcare professionals and found that the conversational interactions helped healthcare professionals build a clear understanding of model assessments, while the integration of scientific evidence calibrated trust in the system's decisions. Most participants reported that the system supported both patient risk evaluation and recommendation.


Using LLMs for Automated Privacy Policy Analysis: Prompt Engineering, Fine-Tuning and Explainability

Chen, Yuxin, Tang, Peng, Qiu, Weidong, Li, Shujun

arXiv.org Artificial Intelligence

Privacy policies are widely used by digital services and often required for legal purposes. Many machine-learning-based classifiers have been developed to automate detection of different concepts in a given privacy policy, which can help facilitate other automated tasks such as producing a more reader-friendly summary and detecting legal compliance issues. Despite the successful applications of large language models (LLMs) to many NLP tasks in various domains, there is very little work studying the use of LLMs for automated privacy policy analysis; therefore, whether and how LLMs can help automate privacy policy analysis remains under-explored. To fill this research gap, we conducted a comprehensive evaluation of LLM-based privacy policy concept classifiers, employing both prompt engineering and LoRA (low-rank adaptation) fine-tuning, on four state-of-the-art (SOTA) privacy policy corpora and taxonomies. Our experimental results demonstrated that combining prompt engineering and fine-tuning can make LLM-based classifiers outperform other SOTA methods, significantly and consistently across privacy policy corpora/taxonomies and concepts. Furthermore, we evaluated the explainability of the LLM-based classifiers using three metrics: completeness, logicality, and comprehensibility. For all three metrics, a score exceeding 91.1% was observed in our evaluation, indicating that LLMs are useful not only for improving classification performance but also for enhancing the explainability of detection results.


UM_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text

Kocbek, Primoz, Kopitar, Leon, Zhang, Zhihong, Aydin, Emirhan, Topaz, Maxim, Stiglic, Gregor

arXiv.org Artificial Intelligence

This paper describes our submissions to the TREC 2024 PLABA track with the aim of simplifying biomedical abstracts for a K8-level audience (students aged 13-14). We tested three approaches using OpenAI's gpt-4o and gpt-4o-mini models: baseline prompt engineering, a two-AI-agent approach, and fine-tuning. Adaptations were evaluated using qualitative metrics (5-point Likert scales for simplicity, accuracy, completeness, and brevity) and quantitative readability scores (Flesch-Kincaid grade level, SMOG Index). Results indicated that the two-agent approach and baseline prompt engineering with gpt-4o-mini models showed superior qualitative performance, while fine-tuned models excelled in accuracy and completeness but were less simple. The evaluation results demonstrated that prompt engineering with gpt-4o-mini outperforms iterative improvement strategies via the two-agent approach as well as fine-tuning with gpt-4o. We intend to expand our investigation of the results and explore advanced evaluations.


IEEEICM25: "A High-Performance Disturbance Observer"

Sariyildiz, Emre

arXiv.org Artificial Intelligence

This paper proposes a novel Disturbance Observer, termed the High-Performance Disturbance Observer, which achieves more accurate disturbance estimation compared to the conventional disturbance observer, thereby delivering significant improvements in robustness and performance for motion control systems.